ggtree: Elegant Graphics for Phylogenetic Tree Visualization and Annotation

您所在的位置:网站首页 visualize object and to figure ggtree: Elegant Graphics for Phylogenetic Tree Visualization and Annotation

ggtree: Elegant Graphics for Phylogenetic Tree Visualization and Annotation

2024-07-14 11:50| 来源: 网络整理| 查看: 265

Chapter 4 Visualization and annotation of phylogenetic trees: ggtree 4.1 Introduction

There are many software packages and webtools that are designed for displaying phylogenetic trees, such as TreeView (Page 2002), FigTree7, TreeDyn (Chevenet et al. 2006), Dendroscope (Huson and Scornavacca 2012), EvolView (Z. He et al. 2016) and iTOL (Letunic and Bork 2007), etc.. Only several of them, such as FigTree, TreeDyn and iTOL, allow users to annotate the trees with coloring branches, highlighted clades and tree features. However, their pre-defined annotating functions are usually limited to some specific phylogenetic data. As phylogenetic trees become more widely used in multidisciplinary studies, there is an increasing need to incorporate various types of the phylogenetic covariates and other associated data from different sources into the trees for visualizations and further analyses. For instance, influenza virus has a wide host range, diverse and dynamic genotypes and characteristic transmission behaviors that are mostly associated with the virus evolution and essentially among themselves. Therefore, in addition to standalone applications that focus on each of the specific analysis and data type, researchers studying molecular evolution need a robust and programmable platform that allows the high levels integration and visualization of many of these different aspects of data (raw or from other primary analyses) over the phylogenetic trees to identify their associations and patterns.

To fill this gap, we developed ggtree, a package for the R programming language (R Core Team 2016) released under the Bioconductor project (Gentleman et al. 2004). The ggtree is built to work with phylogenetic data object generated by treeio, and display tree graphics with ggplot2 package (Wickham 2016) that was based on the grammar of graphics (Wilkinson et al. 2005).

The R language is increasingly being used in phylogenetics. However, a comprehensive package, designed for viewing and annotating phylogenetic trees, particularly with complex data integration, is not yet available. Most of the R packages in phylogenetics focus on specific statistical analyses rather than viewing and annotating the trees with more generalized phylogeny-associated data. Some packages, including ape (Paradis, Claude, and Strimmer 2004) and phytools (Revell 2012), which are capable of displaying and annotating trees, are developed using the base graphics system of R. In particular, ape is one of the fundamental package for phylogenetic analysis and data processing. However, the base graphics system is relatively difficult to extend and limits the complexity of tree figure to be displayed. OutbreakTools (Jombart et al. 2014) and phyloseq (McMurdie and Holmes 2013) extended ggplot2 to plot phylogenetic trees. The ggplot2 system of graphics allows rapid customization and exploration of design solutions. However these packages were designed for epidemiology and microbiome data respectively and did not aim to provide a general solution for tree visualization and annotation. The ggtree package also inherits versatile properties of ggplot2, and more importantly allows constructing complex tree figures by freely combining multiple layers of annotations using the tree associated data imported from different sources (importation using treeio package is detailed in Chapter 2).

4.2 Methods and Materials

The ggtree package is designed for annotating phylogenetic trees with their associated data of different types and from various sources. These data could come from users or analysis programs, and might include evolutionary rates, ancestral sequences, etc. that are associated with the taxa from real samples, or with the internal nodes representing hypothetic ancestor strain/species, or with the tree branches indicating evolutionary time courses. For instance, the data could be the geographic positions of the sampled avian influenza viruses (informed by the survey locations) and the ancestral nodes (by phylogeographic inference) in the viral gene tree (Lam et al. 2012).

The ggtree supports ggplot2’s graphical language, which allows high level of customization, is intuitive and flexible. It is notable that ggplot2 itself does not provide low-level geometric objects or other support for tree-like structures, and hence ggtree is a useful extension on that regard. Even though the other two phylogenetics-related R packages, OutbreakTools and phyloseq, are developed based on ggplot2, the most valuable part of ggplot2 syntax - adding layers of annotations - is not supported in these packages. For example, if we have plotted a tree without taxa labels, OutbreakTools and phyloseq provide no easy way for general R users, who have little knowledge about the infrastructures of these packages, to add a layer of taxa labels. The ggtree extends ggplot2 to support tree objects by implementing a geometric layer, geom_tree, to support visualizing tree structure. In ggtree, viewing a phylogenetic tree is relatively easy, via the command ggplot(tree_object) + geom_tree() + theme_tree() or ggtree(tree_object) for short. Layers of annotations can be added one by one via the + operator. To facilitate tree visualization, ggtree provides several geometric layers, including geom_treescale for adding legend of tree branch scale (genetic distance, divergence time, etc.), geom_range for displaying uncertainty of branch lengths (confidence interval or range, etc.), geom_tiplab for adding taxa label, geom_tipoint and geom_nodepoint for adding symbols of tips and internal nodes, geom_hilight for highlighting a clade with rectangle, and geom_cladelabel for annotating a selected clade with a bar and text label, etc.. A full list of geometric layers provided by ggtree are summarized in Table 4.1.

4.3 Results 4.3.1 Features

The ggtree supports displaying phylogram and cladogram (Figure 4.1) and can visualize a tree with different layouts, including rectangular, slanted, circular, fan, unrooted, time-scale and two-dimensional tree.

The ggtree allows tree covariates stored in tree object to be used directly in tree visualization and annotation. These covariates can be meta data of the sampling species/sequences used in the tree, statistical analysis or evolutionary inferences of the tree (e.g. divergence time inferred by BEAST or ancestral sequences inferred by HyPhy, etc.). These numerical or categorical data can be used to color branches or nodes of the tree, displayed on the tree with original values or mapping to different symbols. In ggtree, users can add layers to highlight the selected clades, and to label clades or annotate the tree with symbols of different shapes and colors, etc. (more details in Section 3.3.3).

Comparing to other phylogenetic tree visualizing packages, ggtree excels at exploring the tree structure and related data visually. For example, a complex tree figure with several annotation layers can be transferred to a new tree object without step-by-step re-creation. An operator, %



【本文地址】


今日新闻


推荐新闻


CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3